Flexible optimized data handling in systems with multiple memories

ABSTRACT

Methods and systems for optimizing an application for a computing system having multiple distinct memory locations that are interconnected by one or more communication channels include determining one or more data handling properties for a data region in an application. One or more data handling policies for the data region are determined based on the one or more data handling properties. Data setup costs are determined for a scope in the application that uses the data region in different memory locations based on the one or more data handling properties. The application is optimized in accordance with the one or more data handling policies and the data setup costs for the different memory locations.

This invention was made with Government support under Contract No. B604142 awarded by Department of Energy. The Government has certain rights in this invention.

BACKGROUND

Technical Field

The present invention generally relates to data management and, more particularly, to the optimization of memory location and memory access channels.

Description of the Related Art

Modern computing systems may have multiple different memories and storage locations available. This is possible on many scales, including, for example, multiple memories within a single device, multiple distributed computing systems that each have local memories, cloud computing systems, etc. When executing software that has access to multiple memories, decisions as to where to store particular data and how to communicate that data to the appropriate location are made either automatically or by hand.

In one conventional approach, low-level programming technologies such as the Message Passing Interface (MPI) require the programmer to manually determine memory storage locations and communication methods. However, this process is error-prone and becomes difficult to optimize as systems grow more complex.

Automatic systems, which need little programmer input, are also available. However, such systems provide generic solutions that may be poorly tuned to the specific application and may have unnecessarily high overheads. Semi-automatic systems control data movement through high-level programmer directives, but these directives only convey which data regions are read or written at specific points in the application, and they do not work well for data regions that have fine-grained, irregular accesses. In the worst case, semi-automatic systems devolve to the low-level approach when recursive, pointer-based data structures are used.

SUMMARY

A method for optimizing an application for a computing system having multiple distinct memory locations that are interconnected by one or more communication channels includes determining one or more data handling properties for a data region in an application. One or more data handling policies for the data region are determined based on the one or more data handling properties. Data setup costs are determined for a scope in the application that uses the data region in different memory locations based on the one or more data handling properties. The application is optimized in accordance with the one or more data handling policies and the data setup costs for the different memory locations.

A method for optimizing an application for a computing system having multiple distinct memory locations that are interconnected by one or more communication channels includes determining one or more data handling properties for a data region in an application. One or more data handling policies are determined for the data region based on the one or more data handling properties. Data setup costs are determined for a scope in the application that uses the data region in different memory locations based on the one or more data handling properties. The application is optimized in accordance with the one or more data handling policies and the data setup costs for the different memory locations. Optimizing includes selecting one or more memory locations in which to store the data region and selecting one or more communication channels by which the data region is transferred between memory locations.

A system for optimizing an application for a computing system having multiple distinct memory locations that are interconnected by one or more communication channels includes a compiler module that has a processor configured to determine one or more data handling properties for a data region in an application, to determine one or more data handling policies for the data region based on the one or more data handling properties, to determine data setup costs for a scope in the application that uses the data region in different memory locations based on the one or more data handling properties, and to optimize the application in accordance with the one or more data handling policies and the data setup costs for the different memory locations.

These and other features and advantages will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

The disclosure will provide details in the following description of preferred embodiments with reference to the following figures wherein:

FIG. 1 is a block diagram of a computing system having multiple memories in accordance with the present principles;

FIG. 2 is a block/flow diagram of a method for selecting data handling policies based on data properties in accordance with the present principles;

FIG. 3 is a block/flow diagram of a method for optimizing application code based on data properties and data handling policies in accordance with the present principles;

FIG. 4 is a block diagram of an optimization system in accordance with the present principles;

FIG. 5 is a block diagram of a processing system in accordance with the present principles;

FIG. 6 is a diagram of a cloud computing environment according to the present principles; and

FIG. 7 is a diagram of abstraction model layers according to the present principles.

DETAILED DESCRIPTION

Embodiments of the present invention automate control of data movement using hints from the programmer that are centered on program data. These hints specify properties of the data that substantially aid subsequent automated optimization and lead to specific policies for data handling.

It is to be understood in advance that, although this disclosure includes a detailed description on cloud computing, implementation of the teachings recited herein is not limited to a cloud computing environment. Rather, embodiments of the present invention are capable of being implemented in conjunction with any other type of computing environment now known or later developed.

Referring now to FIG. 1, a generalized computing system 100 is shown with multiple memories. The computing system 100 includes multiple system nodes 102, each with respective processing resources 104 and memory resources 106. It should be understood that the processing resources 104 may include one or more hardware processors and that the memory resources 106 may include one or more banks of storage of any suitable type, as discussed in greater detail below.

The system nodes 102 communicate with one another over one of several communications channels 108. The communications channels 108 may be any appropriate form of data communication system including, e.g., an in-system bus, a wired connection, a wireless connection, a connection through the internet, etc. Each system node 102 includes one or more memory controllers 110 that receive requests for data, retrieve the requested data from memory resources 106, and communicate the requested data to the requesting node via one or more communications channels 108. The memory controllers 110 also handle replication of data to other nodes 102 to, e.g., increase system performance by creating local copies at nodes 102 that will need them.

It should be noted that the communication channels 108 include both hardware and software aspects. Hardware mechanisms refer to the physically available communication paths, which may be directly exposed using, for example, an application programming interface (API) call by the software. Two memories may have multiple physical connections between them in the form of direct physical interconnects, but there may also be indirect physical connections through, e.g., the data flow from a first memory resource 106, through processing resource 104, to a second memory resource. Software communication mechanisms may be built on top of the hardware mechanisms and may provide additional services or enforce some policy. The hardware mechanisms and the software mechanisms together make up the communication channels 108.

Where the memory controllers 110 store data, and how they communicate data to requesting nodes, is determined by a software program running on the processing resources 104. Each software program will have different needs for data management and therefore will have different optimal data handling policies that correspond to the specific types of data that are being used.

To accommodate these different possibilities, the software program is optimized at compile time and at runtime in accordance with known properties of the data at issue. Data handling policies are determined at compile time, but the code may further be compiled with runtime calls that help implement the policies. Runtime calls may include application programming interface (API) calls for the software communication mechanisms, or they may be calls that query the state of runtime system resources or program data to help choose an execution path.

The data properties may be set explicitly by the programmer or they may be discovered automatically. Data properties that may be used include the size of a data region for data in a given scope, read/write/read-write access status, coverage information, access frequency information, and data layout information. In particular, coverage information refers to how many elements of a data region are accessed (e.g., few, all, most, or some fixed or variable percentage of the data region size), access frequency information refers to how often the data region is accessed (e.g., once, rarely, at regular intervals, in bursts, etc.), and data layout information refers to whether accesses are, e.g., streaming, random, or strided. Other criteria that may be considered during optimization are the time it takes to transfer data, the amount and frequency of data to be transferred, the overhead of copying data, the cost of maintaining coherence and consistency, power and bandwidth constraints for the system 100, and computation-to-communication ratio and overlap.
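As a minimal sketch, assuming a Python-based compiler prototype, the per-region properties listed above could be recorded as follows. The names DataProperties, Access, Frequency, Layout, and unknown_fields are hypothetical illustrations rather than a structure defined by this disclosure, and coverage is modeled here as the fraction of elements touched.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Access(Enum):
    READ = "read"
    WRITE = "write"
    READ_WRITE = "read-write"


class Frequency(Enum):
    ONCE = "once"
    RARELY = "rarely"
    REGULAR = "regular"
    BURSTS = "bursts"


class Layout(Enum):
    STREAMING = "streaming"
    RANDOM = "random"
    STRIDED = "strided"


@dataclass
class DataProperties:
    """Hypothetical record of the properties of one data region in one scope.

    A field left as None is still unknown; it may later be filled in by
    automatic detection or by a default value.
    """
    size_bytes: Optional[int] = None
    access: Optional[Access] = None
    # Fraction of the region's elements that are touched (0.0 to 1.0);
    # "few", "most", and "all" map onto ranges of this value.
    coverage: Optional[float] = None
    frequency: Optional[Frequency] = None
    layout: Optional[Layout] = None

    def unknown_fields(self) -> List[str]:
        """Names of properties that have not been assigned yet."""
        return [name for name, value in vars(self).items() if value is None]
```

Leaving unassigned fields as None makes it easy to see which properties still need to be detected automatically or given a default.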

In one specific embodiment, the system 100 is implemented as a cloud computing system, with many different nodes 102 that may be geographically quite far from one another. In such systems, the cost of transferring data from one node 102 to another across communication channels 108 may be quite high relative to the cost of performing the associated computations on that data. In such a case, the utility of optimizing data storage and communication channels is clear.

Referring now to FIG. 2, a method of selecting policies based on data properties is shown. Block 202 modifies the source code of an application that is intended to run on the system 100 to specify data properties for data regions. It is specifically contemplated that block 202 may be performed manually by the programmer and includes the entry of explicit instructions that specify, for example, one of the properties described above. Block 204 then automatically detects properties for the data regions of the application using, for example, static analysis and dynamic profiling, to fill in as many gaps in the property definitions as possible. For any data regions that have properties that remain unassigned, block 206 sets default values.
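The precedence among blocks 202, 204, and 206 can be illustrated with a small merge function. This is a sketch under the assumption that properties are kept in plain dictionaries; the default values shown are hypothetical, since the disclosure does not specify which defaults block 206 uses.

```python
from typing import Any, Dict, Optional

# Hypothetical defaults for block 206; the disclosure does not fix these values.
DEFAULTS: Dict[str, Any] = {
    "access": "read-write",   # conservative: assume the region may be written
    "coverage": 1.0,          # assume every element is touched
    "frequency": "regular",
    "layout": "random",
}


def resolve_properties(
    explicit: Dict[str, Optional[Any]],
    detected: Dict[str, Optional[Any]],
    defaults: Dict[str, Any] = DEFAULTS,
) -> Dict[str, Any]:
    """Merge property sources for one data region.

    Programmer annotations (block 202) win; automatically detected values
    (block 204) fill the remaining gaps; defaults (block 206) cover whatever
    is still unassigned. A value of None counts as "not provided".
    """
    resolved = dict(defaults)
    resolved.update({k: v for k, v in detected.items() if v is not None})
    resolved.update({k: v for k, v in explicit.items() if v is not None})
    return resolved
```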

Block 208 analyzes the properties for each data region across all scopes of the application. As used herein, the term “scope” refers to a section of the application code, which may be demarcated based on syntactic structures in the code or based on the sequence of instructions to be executed. The analysis will depend on the specific configuration of the system 100 and the needs of the application, but some examples are set forth below. In general, block 208 attempts to optimize one or more system metrics (for example, the speed of the application, power/energy efficiency, or bandwidth utilization) by determining where to store data regions and what communications channels (both hardware and software mechanisms) to use to transfer those data regions. This analysis may consider both hardware and software limitations in view of the application's needs. Based on the analysis, block 210 selects data handling policies, including setting initial data placement, for each data region.

Any of several different data handling policies may apply to a given data region in a given scope in the software program. In general, the policies may be simple (e.g., a selection between two different communication channels) or may be more complex (e.g., select X if the data is located at A, or select Y otherwise).
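One possible way to represent both simple and conditional policies is as functions from a context (here, only the current data location) to a communication channel. The names simple_policy and location_policy, and the context key data_location, are assumptions made for illustration.

```python
from typing import Callable, Dict

# A policy maps a context (facts known at the decision point) to a channel name.
Policy = Callable[[Dict[str, str]], str]


def simple_policy(channel: str) -> Policy:
    """Always select the same communication channel."""
    return lambda ctx: channel


def location_policy(location: str, if_there: str, otherwise: str) -> Policy:
    """Select `if_there` when the data currently resides at `location`, else `otherwise`."""
    def policy(ctx: Dict[str, str]) -> str:
        return if_there if ctx.get("data_location") == location else otherwise
    return policy
```

For example, location_policy("A", "X", "Y") encodes "select X if the data is located at A, or select Y otherwise".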

A first exemplary policy is a choice between different coherence/consistency options. By way of example, a system configuration may include three distinct memory locations, A, B, and C, with coherence supported efficiently in hardware across A and B, but not in C. Coherence for C is handled in software with a high overhead. The application code for an exemplary piece of software relies on system-level coherence and has three scopes, X, Y, and Z, that may be executed in parallel, all of which access the same data region. This data region is copied into the local memories A, B, and C before computation. Then, if X, Y, and Z all write to all elements of the data region, (X, Y, Z) may be mapped to execute on (A, B, C) in any order, with no software coherence enabled on C. However, if X may not write to all of the elements of the data region, then mapping X to C would necessitate software coherence handling at high cost. In this example, selection of a policy would consider these properties to calculate costs and pick computing locations to prevent X from being mapped to C. This policy thereby selects between a communication channel 108 that supports coherence and one that does not.
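The first policy can be sketched as a brute-force search over scope-to-location mappings, charging a penalty whenever a scope that does not overwrite every element lands on the location without hardware coherence. The cost constant and helper names are hypothetical; a real compiler would presumably draw the penalty from the system description rather than from a fixed number.

```python
from itertools import permutations
from typing import Dict, List, Tuple

# Hardware coherence is available at A and B but not at C (from the example above).
HW_COHERENT = {"A": True, "B": True, "C": False}
# Hypothetical relative cost of enabling software coherence for one scope.
SOFTWARE_COHERENCE_COST = 100.0


def mapping_cost(mapping: Dict[str, str], writes_all: Dict[str, bool]) -> float:
    """Charge software coherence only when a scope that does not overwrite
    every element of the region runs at a location without hardware coherence."""
    cost = 0.0
    for scope, loc in mapping.items():
        if not HW_COHERENT[loc] and not writes_all[scope]:
            cost += SOFTWARE_COHERENCE_COST
    return cost


def best_mapping(scopes: List[str], locations: List[str],
                 writes_all: Dict[str, bool]) -> Tuple[Dict[str, str], float]:
    """Try every one-to-one assignment of scopes to locations; keep the cheapest."""
    best: Dict[str, str] = {}
    best_cost = float("inf")
    for perm in permutations(locations):
        mapping = dict(zip(scopes, perm))
        cost = mapping_cost(mapping, writes_all)
        if cost < best_cost:
            best, best_cost = mapping, cost
    return best, best_cost
```

Exhaustive enumeration is only practical for a handful of scopes and locations; it is used here to make the cost reasoning explicit rather than to suggest a search strategy.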

A second exemplary policy is a choice between different software-enabled communication mechanisms (e.g., a selection of software communication channels 108). An exemplary system may have a host processor and memory as well as a separate accelerator processor and memory, where the interconnect between the two memories is bandwidth constrained, taking a fixed time to transfer a small amount of data but longer to transfer amounts larger than some threshold. Two exemplary software libraries implement data transfers: one that eagerly pushes data (bulk transfers) and another that lazily pulls data (multiple fine-grained transfers). In one exemplary application, the software accesses random elements of a large data region. If few elements are accessed, then the total latency of multiple small transfers may be less than the time needed to transfer the entire data region. In this example, selection of a policy would consider the access frequency properties to weigh the costs and the benefits of the different policies. The eager and lazy versions of the software library represent different software mechanisms that characterize distinct communication channels 108.
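A rough cost comparison for the second policy might look like the following. The interconnect model (a fixed latency up to a size threshold, then a bandwidth-limited tail) and all numeric parameters are assumptions chosen only to mirror the qualitative description above.

```python
from typing import Tuple


def transfer_time(nbytes: int, latency_s: float = 10e-6,
                  threshold: int = 4096,
                  bandwidth_bytes_per_s: float = 10e9) -> float:
    """Hypothetical interconnect model: any transfer up to `threshold` bytes
    costs a fixed latency; larger transfers add time proportional to the excess."""
    if nbytes <= threshold:
        return latency_s
    return latency_s + (nbytes - threshold) / bandwidth_bytes_per_s


def choose_library(region_bytes: int, elements_accessed: int,
                   element_bytes: int) -> Tuple[str, float]:
    """Compare one bulk (eager) transfer with one small (lazy) transfer per element."""
    eager = transfer_time(region_bytes)
    lazy = elements_accessed * transfer_time(element_bytes)
    return ("lazy", lazy) if lazy < eager else ("eager", eager)
```

Under these assumed numbers, touching 100 eight-byte elements of a 1 GiB region favors the lazy library, while touching a million of them favors the bulk transfer.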

A third exemplary policy is a choice between different hardware-enabled mechanisms (e.g., a selection of hardware mechanisms). As above, the exemplary system has a host processor and memory and an accelerator processor and memory, where the hardware interconnect allows the accelerator to directly access both host memory and accelerator memory (in other words, the accelerator supports load/store instructions using addresses that map to the host memory as well as addresses that map to the accelerator memory). In this case, data that is rarely accessed on the accelerator need not be copied over to its local memory, whereas data that is frequently accessed should still be copied over to improve performance. Selection of a policy would therefore consider properties defining how frequently the data is to be accessed. In this example, directly accessing the host memory and copying the data to local memory represent different communication channels 108.
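Similarly, the choice between direct remote load/store access and copying the region into accelerator memory can be framed as a comparison of two estimated times. The per-access latencies, copy bandwidth, and function name below are hypothetical values for illustration only.

```python
def choose_access_mode(accesses: int, region_bytes: int,
                       remote_access_s: float = 1e-6,
                       local_access_s: float = 1e-8,
                       copy_latency_s: float = 10e-6,
                       copy_bytes_per_s: float = 10e9) -> str:
    """Return "direct" to load/store host memory remotely, or "copy" to move
    the region into accelerator memory first. All timings are hypothetical."""
    # Rarely accessed data: every access pays the remote latency, but nothing is copied.
    direct = accesses * remote_access_s
    # Frequently accessed data: pay the copy once, then every access is local.
    copy = copy_latency_s + region_bytes / copy_bytes_per_s + accesses * local_access_s
    return "direct" if direct <= copy else "copy"
```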

A fourth exemplary policy is to push data to a next location if data is written exactly once in a scope. An exemplary system may include a host processor and memory as well as a separate accelerator processor and memory, where the interconnect between the two memories has a high transfer latency. Two exemplary software libraries may in addition be available to this exemplary system to implement data transfers: a first library eagerly pushes data to other locations after a write access by copying the data, and a second library lazily pulls data from the location where the data was last updated on a read access. If the software writes exactly once to elements in a data region on the host and then reads them multiple times on the accelerator, then it is more efficient to use the eager push library for transferring elements of the data region across the interconnect, because the relatively high write cost will be outweighed by read savings. The eager and lazy versions of the software library represent different software mechanisms that characterize distinct communication channels 108.
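The fourth policy can be sketched by counting interconnect crossings. This sketch assumes that the eager library pushes each element once per write and that the lazy library pulls an element across the interconnect on every remote read (no caching on the reading side); both assumptions, the latency figure, and the function name are illustrative.

```python
def choose_transfer_library(elements: int, writes_per_element: int,
                            remote_reads_per_element: int,
                            latency_s: float = 50e-6) -> str:
    """Pick eager push or lazy pull by counting high-latency interconnect crossings.

    Assumes eager push sends each element once per write and lazy pull
    fetches it once per remote read (no caching on the reading side)."""
    push_cost = elements * writes_per_element * latency_s
    pull_cost = elements * remote_reads_per_element * latency_s
    return "eager-push" if push_cost < pull_cost else "lazy-pull"
```

Written once on the host and read five times on the accelerator, each element crosses the interconnect once under eager push but five times under lazy pull, which matches the reasoning above.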

A fifth exemplary policy guides placement of data and computations to avoid remote accesses over communications channels 108. The data properties determined per data region can be used to automatically apply data affinity optimizations (e.g., placing data close to the compute location where it will be accessed). An exemplary application may have a large data region that is accessed in a parallel code section. If the data will be accessed in a regular pattern (e.g., streaming or strided data), the data region can be partitioned and placed in multiple memories. Then the compute locations for the parallel code sections can be selected such that they are physically close to the memory that holds the data region partition corresponding to the data accessed by the code. This can help reduce or eliminate data transfers across the communication channels 108.
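For a streaming access pattern in which iteration i touches element i, affinity placement can be sketched as splitting the index space into contiguous blocks, one per memory, and running each block's iterations on the compute location nearest that memory. The memory and compute location names, and the compute_near mapping, are hypothetical.

```python
from typing import Dict, List


def partition(n_elements: int, memories: List[str]) -> Dict[str, range]:
    """Split indices [0, n_elements) into contiguous, nearly equal blocks, one per memory."""
    base, extra = divmod(n_elements, len(memories))
    parts: Dict[str, range] = {}
    start = 0
    for i, mem in enumerate(memories):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first blocks
        parts[mem] = range(start, start + size)
        start += size
    return parts


def place_iterations(parts: Dict[str, range],
                     compute_near: Dict[str, str]) -> Dict[str, range]:
    """Assign each block's loop iterations to the compute location nearest its memory.
    With a streaming pattern (iteration i touches element i), every access stays local."""
    return {compute_near[mem]: idx for mem, idx in parts.items()}
```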

Referring now to FIG. 3, a method of optimizing an application's code is shown. Block 302 selects a new scope from among the different scopes in the application. Block 304 gathers properties for each data region accessed in the scope. Block 304 may re-use the same data properties determined by the process of FIG. 2. Block 306 then performs compiler analysis, augmented by this information. Compiler analysis may include, for example, analyses such as control and data flow analysis, alias analysis, and dependence analysis, that help determine the set of data elements accessed in the code and the access patterns.

Block 308 calculates data setup costs for each of the different computing locations available at the system 100. For each location, information about the set of data elements accessed in the code, the access patterns, and the data handling policies is used to determine the number, size, and direction of data transfers that will be needed if the scope is executed at that computing location. Then, the cost of all of the data transfers can be estimated for the communication channel(s) 108 selected by the data handling policies. An applicable cost metric can include any subset of the system parameters being optimized for, including, e.g., execution time, power/energy efficiency, and/or bandwidth usage.
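Block 308 can be sketched as enumerating the transfers a scope would need at a candidate location and summing a per-channel cost for each. The tuple format for data regions, the Transfer record, and the channel_cost callback are assumptions; in particular, the sketch treats every accessed byte of a non-local region as transferred in, and every written region as transferred back to its home location.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (home location, bytes accessed in this scope, whether the scope writes the region)
Region = Tuple[str, int, bool]


@dataclass
class Transfer:
    src: str
    dst: str
    nbytes: int


def required_transfers(location: str, regions: List[Region]) -> List[Transfer]:
    """Number, size, and direction of transfers if the scope runs at `location`."""
    transfers = []
    for home, nbytes, written in regions:
        if home != location:
            transfers.append(Transfer(home, location, nbytes))      # bring data in
            if written:
                transfers.append(Transfer(location, home, nbytes))  # write results back
    return transfers


def setup_cost(location: str, regions: List[Region],
               channel_cost: Callable[[str, str, int], float]) -> float:
    """Sum the estimated cost of each transfer on the channel chosen by the policy."""
    return sum(channel_cost(t.src, t.dst, t.nbytes)
               for t in required_transfers(location, regions))
```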

Block 310 selects a computing location for the scope based on the calculated costs. The computing location is selected according to one or more needs in the application. For example, some computing locations may lack features that the application needs, or may have a higher cost associated with those features, such that a different computing location may be selected. In another example, the cost may characterize the power consumption of processing, with processing in some locations incurring a higher power cost. Block 312 then applies the data handling policies to the scope at the relevant computing location(s). For each data region in the scope, the communication channel 108 is selected based on the data handling policy for the data region and the communication channels 108 available for that location. The selected communication channel 108 is used for all data transfers in the scope that correspond to elements of that data region.
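Block 310 might then filter candidate locations by the features a scope requires and pick the cheapest remaining one. The cost values can stand for any metric discussed above (time, energy, bandwidth, or a weighted mix); the function and argument names are hypothetical.

```python
from typing import Dict, Set


def select_location(costs: Dict[str, float], features: Dict[str, Set[str]],
                    required: Set[str]) -> str:
    """Return the cheapest location that provides every feature the scope needs."""
    feasible = {loc: c for loc, c in costs.items()
                if required <= features.get(loc, set())}
    if not feasible:
        raise ValueError("no computing location supports the required features")
    return min(feasible, key=feasible.get)
```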

Block 314 determines whether there are any additional scopes in the application that have not been handled yet. If so, processing returns to block 302 where a new scope is selected. If not, block 316 generates optimized code using the selected computing location(s) and the data handling policies. This optimized code takes into account the needs of the application across scopes and in a manner that is closely based on the properties of the data regions involved, without necessitating explicit placement by the programmer. As a result, when the code is executed in block 318, the application runs with better performance.
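Putting blocks 302 through 316 together, a driver loop over scopes might look roughly like the following. It assumes that property gathering and compiler analysis (blocks 304 and 306) are folded into a cost callback, and that per-region channel selection is supplied as a second callback; setup_cost and choose_channel are hypothetical names. It returns a plan rather than emitting code, since code generation in block 316 depends on the target toolchain.

```python
from typing import Callable, Dict, List, Tuple

# scope name -> (chosen computing location, {data region -> selected channel})
Plan = Dict[str, Tuple[str, Dict[str, str]]]


def optimize(scopes: Dict[str, List[str]], locations: List[str],
             setup_cost: Callable[[str, str], float],
             choose_channel: Callable[[str, str], str]) -> Plan:
    """Sketch of the FIG. 3 loop.

    scopes: scope name -> names of data regions used in that scope
    setup_cost(scope, location): estimated data setup cost (block 308)
    choose_channel(region, location): channel picked by the region's policy (block 312)
    """
    plan: Plan = {}
    for scope, regions in scopes.items():                          # blocks 302 and 314
        costs = {loc: setup_cost(scope, loc) for loc in locations}  # block 308
        best = min(costs, key=costs.get)                            # block 310
        channels = {r: choose_channel(r, best) for r in regions}    # block 312
        plan[scope] = (best, channels)
    return plan  # would feed code generation in block 316
```

Because the channel for each region is chosen only after the location is fixed, the sketch follows the ordering of blocks 310 and 312, where the available channels depend on the selected location.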

The optimization criteria that are used to determine the best memory locations for data regions will depend on the specific application and system parameters being used. Optimization criteria (i.e., goals to be achieved by the optimization process) may include, for example, improvements in data transfer time, amount/frequency of data being transferred, overhead of data copying, or cost of maintaining coherence/consistency or meeting power/bandwidth constraints in the system.

The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.

The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

Reference in the specification to “one embodiment” or “an embodiment” of the present principles, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment of the present principles. Thus, the appearances of the phrase “in one embodiment” or “in an embodiment”, as well as any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.

It is to be appreciated that the use of any of the following “/”, “and/or”, and “at least one of”, for example, in the cases of “A/B”, “A and/or B” and “at least one of A and B”, is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of “A, B, and/or C” and “at least one of A, B, and C”, such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended, as readily apparent by one of ordinary skill in this and related arts, for as many items listed.

Referring now to FIG. 4, an optimization system 400 is shown. The system 400 includes a hardware processor 402 and memory 404. The system 400 may further include one or more functional modules. The functional modules may be implemented as software that is stored in the memory 404 and executed by the hardware processor 402. In alternative embodiments, the functional modules may be implemented as one or more discrete hardware components in the form of, e.g., application specific integrated chips or field programmable gate arrays.

A developer environment 406 runs on the system 400 and allows a programmer to make changes to source code 408, which is stored in the memory 404. The developer environment 406 provides the ability to manually specify properties 410 for data regions across various scopes of the source code 408. A compiler module 414 uses the data properties and system description 412 to select data handling policies to apply to the source code and selects computing locations for each data region. The compiler module 414 then outputs a compiled application for execution on a system 100 having multiple computing and memory locations.

Referring now to FIG. 5, an exemplary processing system 500 is shown which may represent the optimizing system 400. The processing system 500 includes at least one processor (CPU) 504 operatively coupled to other components via a system bus 502. A cache 506, a Read Only Memory (ROM) 508, a Random Access Memory (RAM) 510, an input/output (I/O) adapter 520, a sound adapter 530, a network adapter 540, a user interface adapter 550, and a display adapter 560, are operatively coupled to the system bus 502.

A first storage device 522 and a second storage device 524 are operatively coupled to system bus 502 by the I/O adapter 520. The storage devices 522 and 524 can be any of a disk storage device (e.g., a magnetic or optical disk storage device), a solid state magnetic device, and so forth. The storage devices 522 and 524 can be the same type of storage device or different types of storage devices.

A speaker 532 is operatively coupled to system bus 502 by the sound adapter 530. A transceiver 542 is operatively coupled to system bus 502 by network adapter 540. A display device 562 is operatively coupled to system bus 502 by display adapter 560.

A first user input device 552, a second user input device 554, and a third user input device 556 are operatively coupled to system bus 502 by user interface adapter 550. The user input devices 552, 554, and 556 can be any of a keyboard, a mouse, a keypad, an image capture device, a motion sensing device, a microphone, a device incorporating the functionality of at least two of the preceding devices, and so forth. Of course, other types of input devices can also be used, while maintaining the spirit of the present principles. The user input devices 552, 554, and 556 can be the same type of user input device or different types of user input devices. The user input devices 552, 554, and 556 are used to input and output information to and from system 500.

Of course, the processing system 500 may also include other elements (not shown), as readily contemplated by one of skill in the art, as well as omit certain elements. For example, various other input devices and/or output devices can be included in processing system 500, depending upon the particular implementation of the same, as readily understood by one of ordinary skill in the art. For example, various types of wireless and/or wired input and/or output devices can be used. Moreover, additional processors, controllers, memories, and so forth, in various configurations can also be utilized as readily appreciated by one of ordinary skill in the art. These and other variations of the processing system 500 are readily contemplated by one of ordinary skill in the art given the teachings of the present principles provided herein.

Referring now to FIG. 6, illustrative cloud computing environment 50 is depicted. As shown, cloud computing environment 50 comprises one or more cloud computing nodes 10 with which local computing devices used by cloud consumers, such as, for example, personal digital assistant (PDA) or cellular telephone 54A, desktop computer 54B, laptop computer 54C, and/or automobile computer system 54N may communicate. Nodes 10 may communicate with one another. They may be grouped (not shown) physically or virtually, in one or more networks, such as Private, Community, Public, or Hybrid clouds as described hereinabove, or a combination thereof. This allows cloud computing environment 50 to offer infrastructure, platforms and/or software as services for which a cloud consumer does not need to maintain resources on a local computing device. It is understood that the types of computing devices 54A-N shown in FIG. 6 are intended to be illustrative only and that computing nodes 10 and cloud computing environment 50 can communicate with any type of computerized device over any type of network and/or network addressable connection (e.g., using a web browser).

Referring now to FIG. 7, a set of functional abstraction layers provided by cloud computing environment 50 (FIG. 6) is shown. It should be understood in advance that the components, layers, and functions shown in FIG. 7 are intended to be illustrative only and embodiments of the invention are not limited thereto. As depicted, the following layers and corresponding functions are provided:

Hardware and software layer 60 includes hardware and software components. Examples of hardware components include: mainframes 61; RISC (Reduced Instruction Set Computer) architecture based servers 62; servers 63; blade servers 64; storage devices 65; and networks and networking components 66. In some embodiments, software components include network application server software 67 and database software 68.

Virtualization layer 70 provides an abstraction layer from which the following examples of virtual entities may be provided: virtual servers 71; virtual storage 72; virtual networks 73, including virtual private networks; virtual applications and operating systems 74; and virtual clients 75.

In one example, management layer 80 may provide the functions described below. Resource provisioning 81 provides dynamic procurement of computing resources and other resources that are utilized to perform tasks within the cloud computing environment. Metering and Pricing 82 provide cost tracking as resources are utilized within the cloud computing environment, and billing or invoicing for consumption of these resources. In one example, these resources may comprise application software licenses. Security provides identity verification for cloud consumers and tasks, as well as protection for data and other resources. User portal 83 provides access to the cloud computing environment for consumers and system administrators. Service level management 84 provides cloud computing resource allocation and management such that required service levels are met. Service Level Agreement (SLA) planning and fulfillment 85 provide pre-arrangement for, and procurement of, cloud computing resources for which a future requirement is anticipated in accordance with an SLA.

Workloads layer 90 provides examples of functionality for which the cloud computing environment may be utilized. Examples of workloads and functions which may be provided from this layer include: mapping and navigation 91; software development and lifecycle management 92; virtual classroom education delivery 93; data analytics processing 94; transaction processing 95; and source code optimization 96.

Having described preferred embodiments of flexible optimized data handling in systems with multiple memories (which are intended to be illustrative and not limiting), it is noted that modifications and variations can be made by persons skilled in the art in light of the above teachings. It is therefore to be understood that changes may be made in the particular embodiments disclosed which are within the scope of the invention as outlined by the appended claims. Having thus described aspects of the invention, with the details and particularity required by the patent laws, what is claimed and desired protected by Letters Patent is set forth in the appended claims.

What is claimed is:
 1. A computer-implemented method comprising: selecting a scope from a plurality of scopes in the application, each scope of the plurality of scopes being defined by a section of application code; determining, by a compiler, one or more data handling properties of a data region accessed in a selected scope; determining, by the compiler at compile time, one or more data handling policies for the data region based on the one or more data handling properties, wherein the data handling policies perform data transfers implemented by software libraries pushing and pulling the data; and optimizing, by the compiler, the application by selecting one or more memory locations in which to store the data region in accordance with at least the determined one or more data handling properties.
 2. The computer-implemented method of claim 1, wherein the section of application code is demarcated by syntactic structures in the application code.
 3. The computer-implemented method of claim 1, wherein the software libraries include a first software library and a second software library.
 4. The computer-implemented method of claim 3, wherein the first software library includes a policy to at least push data to a next location if the data is written exactly once in the selected scope.
 5. The computer-implemented method of claim 4, wherein the first software library pushes the data to other locations after a write access by copying the data.
 6. The computer-implemented method of claim 3, wherein the second software library pulls the data from the location where the data was last updated on a read access.
 7. The computer-implemented method of claim 1, further comprising determining, at compile time, one or more data handling policies for the data region based on the one or more data handling properties.
 8. The computer-implemented method of claim 1, further comprising determining data setup costs for a plurality of scopes in the application.
 9. The computer-implemented method of claim 8, wherein the plurality of scopes use the data region in different memory locations based on the one or more data handling properties.
 10. The computer-implemented method of claim 1, wherein optimizing the application includes selecting one or more memory locations in which to store the data region.
 11. The computer-implemented method of claim 1, further comprising selecting one or more communications channels by which the data region is transferred between memory locations.
 12. The computer-implemented method of claim 1, wherein the one or more data handling properties further include coverage information.
 13. A computer program product for optimizing an application for a computing system having multiple distinct memory locations, the program instructions executable by a computer to cause the computer to: select a scope from a plurality of scopes in the application, each scope of the plurality of scopes being defined by a section of application code; determine, by a compiler, one or more data handling properties of a data region accessed in a selected scope; determine, by the compiler at compile time, one or more data handling policies for the data region based on the one or more data handling properties, wherein the data handling policies perform data transfers implemented by software libraries pushing and pulling the data; and optimize, by the compiler, the application by selecting one or more memory locations in which to store the data region in accordance with the determined one or more data handling properties.
 14. The computer program product of claim 13, wherein the section of application code is demarcated by syntactic structures in the application code.
 15. The computer program product of claim 13, wherein the software libraries include a first software library and a second software library.
 16. The computer program product of claim 15, wherein the first software library includes a policy to at least push data to a next location if the data is written exactly once in the selected scope.
 17. The computer program product of claim 16, wherein the first software library pushes the data to other locations after a write access by copying the data.
 18. The computer program product of claim 15, wherein the second software library pulls the data from the location where the data was last updated on a read access.
 19. The computer program product of claim 13, wherein the one or more data handling properties further include access frequency information.
 20. The computer program product of claim 13, wherein the one or more data handling properties further include data layout information.
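The push and pull behavior recited in claims 4 to 6 can be illustrated with a minimal Python sketch in which each memory location is modeled as a dictionary of named data regions. The class names, method names, and the dictionary model itself are hypothetical. The sketch only shows the two transfer patterns: the push library copies data to a next location right after a write, and the pull library copies data from the location of the last update when a read occurs at a location whose copy is stale. Choosing the push library for a region that is written exactly once in a scope would be the job of the policy-selection step, not of the library.

```python
from __future__ import annotations

import copy


class PushLibrary:
    """Hypothetical push policy: after a write, copy the data to the next location."""

    def __init__(self, memories: dict[str, dict], next_location: dict[str, str]):
        self.memories = memories            # location name -> {region name: data}
        self.next_location = next_location  # location -> location that consumes its data

    def write(self, location: str, region: str, data) -> None:
        self.memories[location][region] = data
        dest = self.next_location.get(location)
        if dest is not None:
            # Push eagerly by copying, so the consumer's later read is local.
            self.memories[dest][region] = copy.deepcopy(data)


class PullLibrary:
    """Hypothetical pull policy: on a read, fetch from where the data was last updated."""

    def __init__(self, memories: dict[str, dict]):
        self.memories = memories
        self.last_updated: dict[str, str] = {}  # region -> location of the latest write
        self.valid: dict[str, set] = {}         # region -> locations holding the latest version

    def write(self, location: str, region: str, data) -> None:
        self.memories[location][region] = data
        self.last_updated[region] = location
        self.valid[region] = {location}  # every other copy is now stale

    def read(self, location: str, region: str):
        if location not in self.valid[region]:
            # Lazily copy the latest version into the reading location.
            source = self.last_updated[region]
            self.memories[location][region] = copy.deepcopy(self.memories[source][region])
            self.valid[region].add(location)
        return self.memories[location][region]
```

The trade-off between the two is the usual eager-versus-lazy one: pushing hides transfer latency when the next consumer is known, while pulling avoids transferring data that is never read at a given location.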